- Title
- The multi-objective approach to solve the (alpha, beta)-k feature set problem using memetic algorithms
- Creator
- Jimenez, Francia
- Relation
- University of Newcastle Research Higher Degree Thesis
- Resource Type
- thesis
- Date
- 2019
- Description
- Research Doctorate - Doctor of Philosophy (PhD)
- Description
- In many application areas, decision-making is enhanced by information obtained from analyzing data. Indeed, improvements to digital products and services can be driven by insights into complex relationships within the data. To obtain a complete picture of a process, data is commonly gathered from multiple sources, each of which stores the type of data that is essential to it. However, when different sources are aggregated, the resulting data can contain elements that are unreliable, irrelevant, or redundant for a specific problem. This challenge, commonly encountered during data integration, is known as Feature Selection (FS). The k-Feature Set Problem (k-FS) is an FS problem that aims to find the minimum subset of features necessary to describe a dataset. Similarly, the (α; β)-k-Feature Set Problem (ABkFS) aims to find a minimum subset of features, but the subset must additionally satisfy two conditions, α and β, where α relates to the differentiation power and β to the representation power of the subset. The ABkFS is commonly used to reduce the number of features in datasets where the number of features exceeds the number of samples. Such datasets arise in bioinformatics, where a small number of samples (e.g. biological samples obtained from individuals/patients) have their gene expression (the features) measured in a quest to characterize a specific disease. In the literature, state-of-the-art feature selection techniques do not report good performance on this type of dataset because they use univariate tests, which are commonly based on statistical measures across the samples. To date, the ABkFS has been solved with exact models and with heuristics, but only with a single-objective approach.
However, there is a need to consider a multi-objective approach, since minimizing the number of features (usually required to achieve better generalization) “conspires” against the requirement of having large values of α and β. This constitutes a typical scenario in which a multi-objective approach is the most natural alternative. Many engineering solutions are developed using optimization techniques, in which we formally define an optimization problem composed of an objective function (or metric of interest) to be minimized or maximized. A more realistic modeling strategy assesses many objectives simultaneously. Such problems are formally known as multi-objective optimization problems (MOPs), where the main goal is to optimize multiple and possibly conflicting objectives. Two objective functions conflict when improving the value of one worsens the other. A special class of algorithms, known as multi-objective optimization algorithms (MOAs), has been developed to solve MOPs. These algorithms return a set of solutions among which none can be established as better than another; the set represents the trade-off between the objectives being optimized. In the literature, multi-objective techniques have produced good results on a variety of complex problems. They are commonly used to implement wrapper feature selection approaches. Developing a multi-objective filter feature selection method is therefore a challenge, and exploring this niche area with new optimization techniques is promising given the benefits of multi-objective approaches. In the first contribution of this dissertation, we design and implement an efficient Memetic Algorithm for Multi-objective (α; β)-k-Feature Set Problem (MOMA-ABK).
The (α; β)-k-Feature Set Problem (ABkFS) aims to find a subset of features able to “cover” each pair of samples with different class values α times and each pair of samples with the same class value β times. We use a multi-objective optimization approach mainly because the relationship between α, β, and the number of features is unknown. Additionally, we improved the performance of our algorithm by incorporating additional information during the optimization process: we exploited relationships between features by applying clustering techniques to the features and by storing features efficiently in a search tree structure. We experimented with six real-life datasets, and our results show that the use of the search tree structure improves the performance of the algorithm. Considering the challenging area of analyzing high-dimensional datasets, our second contribution is a novel multi-objective (MO) filter feature selection algorithm. We propose a filter feature selection methodology based on the (α; β)-k-Feature Set Problem composed of four stages: preprocessing, MOMA-ABK, classification, and postprocessing. To integrate several Pareto fronts into one set of representative solutions, we propose and implement three novel approaches. In addition, we studied how the α value used during optimization affects the performance of the filter feature selection approach. Our experiments show that our approach performs competitively with state-of-the-art algorithms.
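
The coverage conditions and the trade-off described in the abstract can be illustrated with a small sketch. This is not MOMA-ABK or any code from the thesis: it is a brute-force evaluation on a toy dataset. It assumes discrete features and the usual reading of "cover": a feature covers a pair of samples from different classes when its values differ, and covers a pair from the same class when its values agree. The names `coverage`, `dominates` and `pareto_front` and the toy data are hypothetical.

```python
from itertools import combinations


def coverage(samples, labels, subset):
    """Return the (alpha, beta) achieved by the feature indices in `subset`.

    Assumption: a feature covers a different-class pair when its values
    differ, and a same-class pair when its values agree.
    """
    alpha = beta = None
    for i, j in combinations(range(len(samples)), 2):
        differing = sum(samples[i][f] != samples[j][f] for f in subset)
        if labels[i] != labels[j]:
            # alpha is the worst-case cover over different-class pairs
            alpha = differing if alpha is None else min(alpha, differing)
        else:
            # beta is the worst-case cover over same-class pairs
            agreeing = len(subset) - differing
            beta = agreeing if beta is None else min(beta, agreeing)
    # Report 0 when there are no pairs of the relevant kind (a simplification)
    return (alpha or 0, beta or 0)


def dominates(a, b):
    """a and b are (k, alpha, beta): minimize k, maximize alpha and beta."""
    no_worse = a[0] <= b[0] and a[1] >= b[1] and a[2] >= b[2]
    return no_worse and a != b


def pareto_front(candidates, samples, labels):
    """Keep the candidate subsets whose objectives no other candidate dominates."""
    scored = [(s, (len(s),) + coverage(samples, labels, s)) for s in candidates]
    return [(s, obj) for s, obj in scored
            if not any(dominates(other, obj) for _, other in scored)]


# Toy data: features 0 and 2 separate the classes, feature 1 is noise.
samples = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = ["a", "a", "b", "b"]
candidates = [(0,), (1,), (0, 1), (0, 2), (0, 1, 2)]

front = pareto_front(candidates, samples, labels)
for subset, (k, alpha, beta) in front:
    print(f"features={subset} k={k} alpha={alpha} beta={beta}")
```

On this toy data two subsets remain non-dominated: the single feature `(0,)` with α = β = 1, and the pair `(0, 2)` with α = β = 2. Neither is better than the other, because larger α and β are bought with an extra feature, which is the conflict the abstract describes.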
- Subject
- Feature Selection (FS); k-Feature Set Problem (k-FS); (α; β)-k-Feature Set Problem (ABkFS); data integration
- Identifier
- http://hdl.handle.net/1959.13/1400428
- Identifier
- uon:34769
- Rights
- Copyright 2019 Francia Jimenez
- Language
- eng
- Full Text
File | Description | Size | Format
---|---|---|---
ATTACHMENT01 | Thesis | 7 MB | Adobe Acrobat PDF
ATTACHMENT02 | Abstract | 503 KB | Adobe Acrobat PDF